Stacked Long-Term TDNN for Spoken Language Recognition

نویسندگان

  • Daniel Garcia-Romero
  • Alan McCree
چکیده

This paper introduces a stacked architecture that uses a time delay neural network (TDNN) to model long-term patterns for spoken language identification. The first component of the architecture is a feed-forward neural network with a bottleneck layer that is trained to classify context-dependent phone states (senones). The second component is a TDNN that takes the output of the bottleneck, concatenated over a long time span, and produces a posterior probability over the set of languages. The use of a TDNN architecture provides an efficient model to capture discriminative patterns over a wide temporal context. Experimental results are presented using the audio data from the language i-vector challenge (IVC) recently organized by NIST. The proposed system outperforms a state-of-the-art shifted delta cepstra i-vector system and provides complementary information to fuse with the new generation of bottleneckbased i-vector systems that model short-term dependencies.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Phoneme-level speech and natural language intergration for agglutinative languages

A new tightly coupled speech and natural language integration model is presented for a TDNN-based large vocabulary continuous speech recognition system. Unlike the popular n-best techniques developed for integrating mainly HMM-based speech and natural language systems in word level, which is obviously inadequate for the morphologically complex agglutinative languages, our model constructs a spo...

متن کامل

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB) as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIBchr('39')s archive is one of the richest archives in Iran containing a huge amount of multimedia data. Monitoring this massive volume of data, and brows and retrieval of this archive is one of the key issues for this broadcasting...

متن کامل

Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean

A new tightly coupled speech and natural language integration model is presented for a TDNN-based continuous possibly large vocabulary speech recognition system for Korean. Unlike popular n-best techniques developed for integrating mainly HMM-based speech recognition and natural language processing in a word level, which is obviously inadequate for morphologically complex agglutinative language...

متن کامل

Integrating connectionist, statistical and symbolic approaches for continuous spoken Korean processing

This paper presents a multi-strategic and hybrid approach for large-scale integrated speech and natural language processing, employing connectionist, statistical and symbolic techniques. The developed spoken Korean processing engine (SKOPE) integrates connectionist TDNN-based phoneme recognition technique with statistical Viterbi-based lexical decoding and symbolic morphological/phonological an...

متن کامل

Recognition of spelled names over the telephone

Recognition of spelled names over the telephone line is essential for applications such as telephone directory assistance, or automatic mail ordering. We present recognition results on the spelling section of the OGI Spelled and Spoken Word Telephone Corpus, using a Multi-State Time Delay Neural Network (MS-TDNN). Many applications allow for strong language modeling constraints. In our experime...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016